This workshop covers a set of statistical and data analysis skills using the R programming language over two days.

Overview

Day 1:

  • R fundamentals
  • Sourcing and wrangling data
  • Exploratory Data Analysis

Day 2:

  • Interactive, team-science-based real-world application
  • Advanced topics in statistical analysis

Day 1: Morning

Learning objective

To learn fundamental R coding and how to create a reproducible report using R Markdown, then to source and wrangle data with R in the context of a research question.

Foundational R code

R is mainly used as an object-oriented statistical coding language: we read in or create objects and then apply functions to them. A set of functions is organized into a package, and these user-developed packages are essential to R’s success as a coding language. At its most fundamental, R is a sophisticated calculator. For example:

2+2 
## [1] 4

Or, if we create a vector (an object) using c() and assign it to the name new_list with <-, we can perform the same operation on every element of it:

new_list <- c(2,3,4,5,6,4,5,6,7,7,8)

new_list + 5
##  [1]  7  8  9 10 11  9 10 11 12 12 13

We can calculate basic statistics on new_list using built-in R functions. In this code chunk we compute the mean, median, and a summary of the vector; writing package::specific_function makes explicit which package each function comes from.

base::mean(new_list)
## [1] 5.181818
stats::median(new_list)
## [1] 5
base::summary(new_list)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   4.000   5.000   5.182   6.500   8.000

Installing and loading packages

Individuals such as Hadley Wickham have built sets of packages designed to work well together. One of the most popular is the tidyverse. During this workshop we will use several tidyverse packages to import and wrangle our data.

First we must install and load the package for use.

install.packages("tidyverse")

We use library() to load packages, which lets us use the “latest and greatest” installed version each time the document code is executed. Note that library() stops with an error when a package is missing, while require() returns FALSE with a warning, so require() is better suited to conditional checks than to routine loading.
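A small illustration of the difference, using the stats package that ships with R (so nothing extra needs to be installed):

```r
# library() attaches a package and errors if it is not installed;
# require() also attaches it, but returns FALSE (with a warning) when missing.
ok <- require(stats)   # stats ships with R, so this returns TRUE
ok
## [1] TRUE
```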

library(tidyverse)

Importing Data

Using the readr package from the tidyverse, files with extensions such as .csv or .txt can be imported. Shortened (relative) file paths like "./with_in_project_file_path" can be used when the data exist within the current working directory, which is usually the project folder. Here we will import data from Airbnb.

airbnb_data <- readr::read_csv("./data/airbnb_listings.csv")

Now that we have read the data, let us preview the first 10 rows. In this code chunk you will notice the symbol %>% at the end of each line, which represents piping: as we wrangle our objects we simply pass each result along to the next operation, taking a preview at each step before saving the finished wrangled object if necessary.
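As a minimal pipe illustration independent of the Airbnb data (the %>% operator comes from the magrittr package, which the tidyverse re-exports; loading magrittr directly keeps this snippet self-contained):

```r
library(magrittr)

# The pipe passes the left-hand result as the first argument of the next call,
# so this is equivalent to mean(c(2, 3, 4)).
c(2, 3, 4) %>% 
  mean()
## [1] 3
```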

airbnb_data %>% 
  utils::head(n = 10) %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
[Output: a scrollable table of the first 10 listings, with columns id, name, host_id, host_name, neighbourhood_group, neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365, number_of_reviews_ltm, and license.]

The knitr package is not part of the tidyverse, but it pairs well with it and makes rendering tables to HTML simple and clean. Packages like kableExtra are support packages that provide additional formatting for HTML rendering.

Wrangling data

Let’s say we wanted to select a few variables that could help us answer the question: what is the most expensive neighborhood for Airbnb listings by room type?

The dplyr package has a set of functions that allow us to effectively wrangle data.

airbnb_data %>% 
  dplyr::select(neighbourhood, room_type, price) %>% 
  dplyr::group_by(neighbourhood, room_type) %>% 
  dplyr::summarize(avg_price = mean(price, na.rm = T)) %>%
  dplyr::ungroup() %>% 
  dplyr::group_by(room_type) %>% 
  dplyr::filter(avg_price == max(avg_price, na.rm = T)) %>% 
  dplyr::arrange(-avg_price) %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
neighbourhood        room_type         avg_price
SoHo                 Private room       2956.045
Todt Hill            Entire home/apt    1070.500
West Village         Shared room        1000.000
Washington Heights   Hotel room          974.000

Let’s reshape our data set and create new columns. This time we are interested in neighborhoods that have private room prices that are less than the price for entire homes.

airbnb_data %>% 
  dplyr::select(neighbourhood, room_type, price) %>% 
  dplyr::group_by(neighbourhood, room_type) %>% 
  dplyr::summarize(avg_price = mean(price, na.rm = T)) %>%
  dplyr::ungroup() %>% 
  tidyr::pivot_wider(
    names_from = room_type,
    values_from = avg_price
  ) %>% 
  janitor::clean_names() %>% 
  dplyr::mutate(ind1 = dplyr::if_else(private_room < entire_home_apt, 1, 0),
                diff = entire_home_apt - private_room) %>% 
  dplyr::filter(ind1 == 1,
                diff <= 100
                ) %>% 
  dplyr::arrange(diff) %>% 
  utils::head(n = 10) %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
neighbourhood        entire_home_apt   private_room   shared_room   hotel_room   ind1        diff
Graniteville               102.00000       95.50000      99.00000           NA      1    6.500000
Midtown                    388.56869      380.17936     106.83333     582.1358      1    8.389333
Mount Eden                 122.50000      111.50000            NA           NA      1   11.000000
Stuyvesant Town            237.22727      221.14286           NaN           NA      1   16.084416
Dyker Heights              119.16667      102.66667            NA           NA      1   16.500000
Kew Gardens Hills          208.07692      189.33333            NA           NA      1   18.743590
Unionport                  166.85714      148.00000            NA           NA      1   18.857143
Fieldston                  101.33333       80.80000            NA           NA      1   20.533333
Mariners Harbor             95.44444       73.66667     129.50000           NA      1   21.777778
Bedford-Stuyvesant         218.67332      194.77198      75.18182           NA      1   23.901340

Visualizations

Now let’s create a plot of prices by room type and county.

airbnb_data %>% 
  dplyr::filter(!is.na(price)) %>% 
  ggplot(aes(x = neighbourhood_group,y = log(price), 
             fill = room_type))+
  geom_boxplot()+
  labs(x = "county", y = "log price")

Reproducible reports

When using R Markdown there are three areas of the document to consider. (i) The YAML header controls what the knitted document will look like and contains metadata such as the title and date.

---
title: "Hampton University Workshop Guidebook"
output: 
  html_document:
    toc: true
    toc_float: true
---

Afterwards, (ii) the open document space can contain raw HTML or CSS, but most often it holds plain text; all of the paragraphs in this document are written in this empty space.

Lastly, (iii) code chunks contain calculations that are closely related to each other; for example, reading and viewing data, or creating multiple plots.

# You can still write comments in code chunks, as long as they are preceded by a '#' sign

R code goes here 

Here’s a template that can be used to start all reports.

# global default settings for chunks
knitr::opts_chunk$set(eval = TRUE, echo = TRUE,
                      warning = FALSE, message = FALSE,
                      fig.width = 9,
                      fig.height = 5,
                      dpi = 300,
                      fig.pos = "!h"
                      )

# Load all packages for project
Packages <- c("tidyverse","patchwork")

invisible(lapply(Packages, library, character.only = TRUE))


# theme global setting for ggplot
theme_set(theme_minimal() +
            theme(legend.position = "bottom",
                  plot.title = element_text(hjust = 0.5, size = 12),
                  plot.subtitle = element_text(hjust = 0.5, size = 8))
          )

Now that the packages are loaded, there is no need to specify which package a function is called from.
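One caveat worth knowing (a general R behavior, not specific to this workshop): attaching a package can mask same-named functions from other packages, so the :: prefix remains useful for disambiguation. For example, after library(tidyverse), an unqualified filter() call means dplyr::filter(), and the time-series filter from stats must be called with its prefix:

```r
# stats::filter() applies a linear (time-series) filter; dplyr::filter() subsets
# rows. With the tidyverse attached, the stats version needs its prefix.
# Here: running sums over a window of 2 past values (first entry is NA).
stats::filter(1:5, rep(1, 2), sides = 1)
```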

Extra

If interested: Here is how to make an interactive geospatial plot using the leaflet package.

pall <- leaflet::colorFactor(
  palette = "viridis",
  domain = airbnb_data$room_type %>% factor())

airbnb_data %>% 
  rename("boro" = "neighbourhood_group") %>% 
  filter(boro %in% c("Brooklyn","Manhattan","Queens")) %>% 
  na.omit(price) %>% 
  sample_n(1000) %>% 
  mutate(
    click_label = 
      str_c("<b>$", price, "</b><br>", 
            room_type, "<br>", number_of_reviews, " reviews")) %>% 
  leaflet::leaflet(width = "100%", height = 600) %>% 
  leaflet::addProviderTiles(leaflet::providers$CartoDB.Positron) %>% 
  leaflet::addCircleMarkers(lat = ~latitude, lng = ~longitude, 
                            radius = 5, 
                            color = ~pall(room_type), 
                            popup = ~click_label,
                            stroke = NA) %>% 
  leaflet::addLegend("bottomright", 
            pal = pall, 
            values = ~room_type, 
            title = "Room Type",
            opacity = 1)

Research Question

Research question: measuring pharmacy access and assessing its association with other neighborhood-level variables.

Defining Access

Access is a term that refers to the multiple dimensions of how one fits with a health system, as defined by Penchansky and Thomas in 1981. Aspects of access had been investigated long before, but their paper proposed a taxonomic definition describing five dimensions of access: accessibility, availability, affordability, acceptability, and accommodation.

  • Accessibility speaks to one’s proximity to an amenity,

  • Availability to volume,

  • Affordability to health insurance and income,

  • Acceptability to social norms when providing services, and

  • Accommodation to appropriateness of utilizing a service.

Disparities in Access

Differential access to amenities continues to be a landmark topic in the disparities discourse. Amenities such as parks, food, and healthcare are among the most studied with respect to built environments such as neighborhoods (black2010neighborhoods, neckerman2009disparities), and existing disparities are exacerbated by growing demand during crises (opez2020parks, boz2024investigating).

Disparities in Pharmacy Access

In the context of pharmacy access, Guadamuz et al. found that pharmacy closures between 2007 and 2015 occurred mostly in minority neighborhoods in the most urban cities of the US (Guadamuz2021). Given that pharmacies serve as an extension of the US healthcare system (DiasFernandes2022), they are regarded as an important neighborhood amenity, not only for medicine but also for food, counseling, and other services that physicians may not be able to provide.

Measuring Pharmacy Access

The most popular way to measure pharmacy access is with summary statistics of proximity measures, followed by density measures and then more sophisticated geospatial smoothing (DiasFernandes2022).
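As a small sketch of what a density measure looks like (hypothetical numbers for illustration only, not the NaNDA data), one simple form is pharmacies per 1,000 residents:

```r
# Hypothetical tract-level counts, illustration only
tracts <- data.frame(
  pharmacy_count = c(0, 2, 5),
  population     = c(3000, 4500, 12000)
)

# Density as pharmacies per 1,000 residents
tracts$pharm_per_1k <- tracts$pharmacy_count / tracts$population * 1000
round(tracts$pharm_per_1k, 2)
## [1] 0.00 0.44 0.42
```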

Data sources and study variables

Given the limited data available, we will use pharmacy count and pharmacy density from the National Neighborhood Data Archive (NaNDA) to summarize pharmacy access in the state of Virginia. On Day 2, during the interactive session, we will pick other neighborhood-level variables from the American Community Survey (ACS) 2017 5-year estimates to test their association with pharmacy access in the year 2017.

Variables of interest will be decided on Day 2, once we have generated a hypothesis. Here is an example set.

  • Pharmacy count (primary outcome)

  • Population count

  • Race/ethnicity

  • Median household income

  • Household size

  • Rent burden

  • Median cost of owned home

  • Mobility

Sourcing data

In this section you will learn how to source neighborhood variables from the American Community Survey (ACS). A Census API key is needed in order to pull ACS data through the API. Pharmacy data can be downloaded directly from the National Neighborhood Data Archive (NaNDA).

acs_vars <- c(
  # population
  population = "B01001_001",
  #Social economic measures
  median_hh_inc ="B19013_001",
  median_value_owner = "B25097_002",
  
  below_hs = "B12001_002",
  college = "B12001_004",
  edu_tot = "B12001_001",   
  
  age_over_65 = "B01001_005",
  age_tot = "B01001_001",
  
  
  unemployment = "B23025_005",
  une_tot = "B23025_003",
  
  avg_hh_size = "B25010_001",
  
  #Mobility measures
  drive = "B08301_002",
  public_trans = "B08301_010",
  walked = "B08301_019",
  work_home = "B08301_021",
  total_mobile = "B08301_001", # used to calculate pct
  
  # race ethnicity measures 
  non_hisp_white = "B03002_003",
  non_hisp_black = "B03002_004",
  hisp = "B03002_012",
  non_hisp_asian = "B03002_006",
  tot_pop = "B03002_001"
  
  
)

acs_check1 <- str_detect(acs_vars, '_0[0-9][0-9]$')
if (any(!acs_check1)) {
    cat(acs_vars[which(!acs_check1)], '\n')
    stop('PLEASE REVIEW ACS VARIABLE IDS or add underscore before last 3 numbers')
}
# PULL ACS DATA
############################################
acs_raw <- tidycensus::get_acs(
    survey = 'acs5',
    variables = acs_vars,
    geography = 'tract',
    state = 'VA',
    year = 2017,
    key = Sys.getenv('census_key')
) %>%
    rename_all(str_to_lower) %>%
  select(-moe) %>%
  group_by(geoid, name) %>%
  pivot_wider(
    values_from = "estimate",
    names_from = "variable"
  ) %>% 
   mutate(
     drive_pc = drive/total_mobile,
     public_trans_pc = public_trans/total_mobile,
     walked_pc = walked/total_mobile,
     work_home_pc = work_home/total_mobile,
     non_hisp_black_pc = non_hisp_black/tot_pop,
     non_hisp_white_pc = non_hisp_white/tot_pop,
     non_hisp_asian_pc = non_hisp_asian/tot_pop,
     hisp_pc = hisp/tot_pop,
     below_hs_pc = below_hs/edu_tot,
     college_pc = college/edu_tot,
     age_over_65_pc = age_over_65/population,
     age_65_under_pc = 1 - age_over_65_pc,
     unemployment_pc = unemployment/une_tot
   ) %>% 
   ungroup()

Morning Review

In this section we learned foundational R syntax and used popular packages to import and wrangle data. We sourced data from the ACS and now have a template for creating reproducible reports with R Markdown.

Additional resources

If you’d like to learn more about what we talked about during this session here are a few resources that go into more detail.

  • An intro to R can be very useful for getting a handle on the syntax, but worked examples can be even more useful; you can find a list of them here

  • We touched on a few packages from the tidyverse, but there are many more, detailed in the link, that perform very complex operations on data sets

Day 1: Afternoon

In this section we will discuss a workflow and common packages used for exploratory data analysis (EDA).

Work Flow

When conducting exploratory data analysis:

  • First, determine an initial set of questions about the data.
  • Second, create a table of summary statistics and visualize the distribution of your variables. This is considered univariate analysis.
  • Third, use bivariate analyses to understand how the predictors correlate, with the goal of identifying themes in the data; this step may include hypothesis testing.
  • Lastly, fit basic bivariate statistical models to determine the magnitude of associations.

Common packages

There are a few packages essential for EDA, namely: ggplot2, arsenal, patchwork, GGally, sf, and tigris.

  • As we have seen, ggplot2 handles almost all visualization needs, and its themes can be set at the beginning of the document for consistent visualization. The patchwork package allows plots to be shown in a grid-like layout

  • For easy creation of tables that include descriptive statistics and hypothesis test results, the arsenal package handles that and more

  • For quick bivariate comparisons, GGally creates a matrix of both visualizations and tests.

  • Lastly, sf and tigris will be used to create our geospatial plots. spdep and rgeoda are less common in EDA but very popular in spatial analysis.

Example

Here is an example of an EDA conducted with variables relevant to our research question. First, let’s create a table of our variables of interest using the arsenal package.

Univariate analysis

Let’s define our controls for our table.

my_controls <- arsenal::tableby.control(
  test = T,
  total = T,
  numeric.stats = c("N", "meansd", "medianq1q3", "range", "Nmiss2"),
  cat.stats = c("countpct", "Nmiss2"),
  stats.labels = list(
    meansd = "Mean (SD)",
    medianq1q3 = "Median (Q1, Q3)",
    range = "Min - Max",
    Nmiss2 = "Missing"
  )
)



arsenal::tableby(~., data = acs_raw %>% 
          select(drive_pc:hisp_pc)) %>% 
  summary(text = TRUE)%>% 
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
Overall (N=1907)

Variable             N-Miss   Mean (SD)       Range
drive_pc             32       0.868 (0.109)   0.280 - 1.000
public_trans_pc      32       0.042 (0.071)   0.000 - 0.556
walked_pc            32       0.026 (0.052)   0.000 - 0.548
work_home_pc         32       0.047 (0.034)   0.000 - 0.439
non_hisp_black_pc    28       0.201 (0.214)   0.000 - 0.973
non_hisp_white_pc    28       0.627 (0.241)   0.000 - 1.000
non_hisp_asian_pc    28       0.056 (0.079)   0.000 - 0.613
hisp_pc              28       0.083 (0.096)   0.000 - 0.766

Let’s create visualizations.

What is the distribution of mobility in VA?

acs_raw %>% 
  ggplot()+
  geom_histogram(aes(x = drive_pc, fill = "Drive"),alpha = 0.3)+
  geom_histogram(aes(x = walked_pc, fill = "Walk"), alpha = 0.3)+
  geom_histogram(aes(x = public_trans_pc, fill = "Public Transport"), alpha = 0.3)+
  labs(x = "Share of commuters by mode", y = "Number of census tracts")+
  scale_fill_brewer(palette = "Set3")

Bivariate analysis

How does the distribution of race ethnicity correlate with mobility?

cor.test(acs_raw$drive_pc, acs_raw$non_hisp_black_pc, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  acs_raw$drive_pc and acs_raw$non_hisp_black_pc
## S = 1035548155, p-value = 0.01289
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.05742079
p1 <- acs_raw %>% 
  ggplot() +
  geom_point(aes(x = drive_pc, y = non_hisp_black_pc))

p2 <- acs_raw %>% 
  ggplot() +
  geom_point(aes(x = public_trans_pc, y = non_hisp_black_pc))

p1 + p2  # placed side by side via patchwork

Or we could use GGally to look at many comparisons at once for our bivariate analysis. In the scatter plots, each dot represents a census tract.

acs_raw %>% 
  select(non_hisp_white_pc, non_hisp_black_pc, non_hisp_asian_pc, hisp_pc, drive_pc) %>% 
GGally::ggpairs()

How does this look geospatially?

invisible(capture.output({
tracts = tigris::tracts(state = "VA", year = 2017, cb = T)
}))
va_acs = tigris::geo_join(
  tracts,
  acs_raw,
  by_sp = "GEOID",
  by_df = "geoid",
  how = "inner"
)

p_black <- va_acs %>% 
  ggplot() +
  geom_sf(aes(fill = non_hisp_black_pc)) +
  scale_fill_viridis_b()

p_drive <- va_acs %>% 
  ggplot() +
  geom_sf(aes(fill = drive_pc)) +
  scale_fill_viridis_b()

p_black + p_drive  # combined via patchwork

How does pharmacy access relate to race-ethnicity?

pharm_data = read_csv("data/nanda_healthcare_tract_2003-2017_03P.csv") %>% 
  filter(year == 2017) %>% 
  select(count_446110, tract_fips10, popden_446110)

virginia_pharm = tigris::geo_join(
  va_acs,
  pharm_data ,
  by_sp = "GEOID",
  by_df = "tract_fips10",
  how = "inner"
)

virginia_pharm %>% 
  sf::st_drop_geometry() %>% 
  ggplot()+
  geom_histogram(aes(x=count_446110 ), fill = "red", alpha = 0.3)

virginia_pharm %>% 
  sf::st_drop_geometry() %>% 
  ggplot()+
  geom_histogram(aes(x=popden_446110 ), fill = "red", alpha = 0.3)

virginia_pharm %>% 
  sf::st_drop_geometry() %>% 
  select(non_hisp_white_pc, non_hisp_black_pc, non_hisp_asian_pc, hisp_pc, popden_446110) %>% 
GGally::ggpairs()

Geospatially?

virginia_pharm %>% 
ggplot()+
  geom_sf(aes(fill = count_446110  ))+
  scale_fill_viridis_b()

virginia_pharm %>% 
ggplot()+
  geom_sf(aes(fill = count_446110 %>% log()  ))+
  scale_fill_viridis_b(name = "Log Pharmacy Count")

p_pharm <- virginia_pharm %>% 
  ggplot() +
  geom_sf(aes(fill = log(count_446110))) +
  scale_fill_viridis_b(name = "Log Pharmacy Count")

p_black <- virginia_pharm %>% 
  ggplot() +
  geom_sf(aes(fill = non_hisp_black_pc)) +
  scale_fill_viridis_b()

p_pharm + p_black  # combined via patchwork

p_pharm <- virginia_pharm %>% 
  ggplot() +
  geom_sf(aes(fill = log(count_446110))) +
  scale_fill_viridis_b(name = "Log Pharmacy Count")

p_white <- virginia_pharm %>% 
  ggplot() +
  geom_sf(aes(fill = non_hisp_white_pc)) +
  scale_fill_viridis_b()

p_pharm + p_white  # combined via patchwork

Levels of pharmacy access

It turns out there are lots of zeros in the data, so we can only get three categories. This is largely exploratory, and the p-values are not necessarily comparable given that we have not accounted for multiple comparisons. We are using a non-parametric test to assess the difference in median racial/ethnic population percentages across neighborhood pharmacy count levels.

arsenal::tableby(pharm_levels ~ count_446110 + non_hisp_black_pc + non_hisp_white_pc + non_hisp_asian_pc + hisp_pc, data = virginia_pharm %>%
          sf::st_drop_geometry() %>% 
          mutate(pharm_levels = case_when(
            count_446110 <= quantile(count_446110, 0.25) ~ 1,
            count_446110 > quantile(count_446110, 0.25) & count_446110 <= quantile(count_446110, 0.50)~ 2,
            count_446110 > quantile(count_446110, 0.50) & count_446110 <= quantile(count_446110, 0.75) ~ 3,
            count_446110 > quantile(count_446110, 0.75) ~ 4
          )), control = my_controls)%>% 
  summary(text = TRUE)%>% 
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
                     1 (N=1083)             3 (N=429)              4 (N=388)              Total (N=1900)         p value
count_446110                                                                                                    < 0.001
  N                  1083                   429                    388                    1900
  Mean (SD)          0.000 (0.000)          1.000 (0.000)          2.554 (0.895)          0.747 (1.078)
  Median (Q1, Q3)    0.000 (0.000, 0.000)   1.000 (1.000, 1.000)   2.000 (2.000, 3.000)   0.000 (0.000, 1.000)
  Min - Max          0.000 - 0.000          1.000 - 1.000          2.000 - 7.000          0.000 - 7.000
  Missing            0                      0                      0                      0
non_hisp_black_pc                                                                                               0.005
  N                  1062                   429                    388                    1879
  Mean (SD)          0.214 (0.232)          0.193 (0.195)          0.174 (0.177)          0.201 (0.214)
  Median (Q1, Q3)    0.133 (0.046, 0.286)   0.127 (0.057, 0.254)   0.108 (0.048, 0.241)   0.127 (0.049, 0.272)
  Min - Max          0.000 - 0.973          0.000 - 0.917          0.000 - 0.915          0.000 - 0.973
  Missing            21                     0                      0                      21
non_hisp_white_pc                                                                                               0.008
  N                  1062                   429                    388                    1879
  Mean (SD)          0.613 (0.252)          0.631 (0.234)          0.657 (0.216)          0.627 (0.241)
  Median (Q1, Q3)    0.651 (0.447, 0.820)   0.670 (0.490, 0.828)   0.686 (0.514, 0.830)   0.664 (0.469, 0.822)
  Min - Max          0.000 - 1.000          0.043 - 0.995          0.033 - 0.999          0.000 - 1.000
  Missing            21                     0                      0                      21
non_hisp_asian_pc                                                                                               0.959
  N                  1062                   429                    388                    1879
  Mean (SD)          0.056 (0.081)          0.057 (0.076)          0.055 (0.078)          0.056 (0.079)
  Median (Q1, Q3)    0.022 (0.003, 0.076)   0.027 (0.005, 0.084)   0.026 (0.006, 0.074)   0.024 (0.004, 0.077)
  Min - Max          0.000 - 0.543          0.000 - 0.504          0.000 - 0.613          0.000 - 0.613
  Missing            21                     0                      0                      21
hisp_pc                                                                                                         0.873
  N                  1062                   429                    388                    1879
  Mean (SD)          0.082 (0.094)          0.085 (0.100)          0.082 (0.097)          0.083 (0.096)
  Median (Q1, Q3)    0.052 (0.021, 0.107)   0.052 (0.023, 0.106)   0.052 (0.023, 0.103)   0.052 (0.022, 0.106)
  Min - Max          0.000 - 0.699          0.000 - 0.600          0.000 - 0.766          0.000 - 0.766
  Missing            21                     0                      0                      21

During Day 2’s session we will cover how one might handle such findings within the model.

Modeling

Testing the association between pharmacy count and racial/ethnic groups. Typically, count data follow a Poisson distribution and are modeled as such. Let \(Y_i \sim \text{Poisson}(\lambda_i)\) be a random variable representing the number of pharmacies in census tract \(i \in 1, \dots, n\). The rate is denoted \(\lambda_i\), and the average pharmacy rate across all census tracts is estimated as \(\hat{\lambda} = \sum_i Y_i / n\).
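A quick numeric illustration of \(\hat{\lambda}\) (toy counts, not the pharmacy data):

```r
# Toy tract counts, illustration only
y <- c(0, 0, 1, 2, 0, 3)          # counts in n = 6 hypothetical tracts
lambda_hat <- sum(y) / length(y)  # average rate across tracts
lambda_hat
## [1] 1
```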

# to choose which model is best to account for overdispersion 
glm(count_446110 ~1, data = virginia_pharm %>% sf::st_drop_geometry(), family = poisson()) %>% 
  summary()
## 
## Call:
## glm(formula = count_446110 ~ 1, family = poisson(), data = virginia_pharm %>% 
##     sf::st_drop_geometry())
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2226  -1.2226  -1.2226   0.2777   4.3375  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.29120    0.02654  -10.97   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 2789.6  on 1899  degrees of freedom
## Residual deviance: 2789.6  on 1899  degrees of freedom
## AIC: 4735.6
## 
## Number of Fisher Scoring iterations: 6

A strong consideration when modeling count data is dispersion. Specifically, overdispersion occurs when the variance of the data is larger than the mean. There are formal tests for determining whether count data are overdispersed; however, we can also compare the AIC, which represents a model’s goodness of fit, of a Poisson and a negative binomial model. The negative binomial model includes an additional parameter to adjust for the overdispersion.
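A simulated illustration of what overdispersion looks like (synthetic data, not the pharmacy counts): Poisson counts have variance roughly equal to their mean, while negative binomial counts with the same mean have a larger variance (here mu + mu^2/size = 6).

```r
set.seed(1)
pois_counts <- rpois(10000, lambda = 2)          # variance ~ mean ~ 2
nb_counts   <- rnbinom(10000, mu = 2, size = 1)  # variance = mu + mu^2/size = 6

c(mean = mean(pois_counts), var = var(pois_counts))  # both near 2
c(mean = mean(nb_counts),   var = var(nb_counts))    # variance well above the mean
```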

MASS::glm.nb(count_446110 ~1, data = virginia_pharm %>% sf::st_drop_geometry()) %>% 
  summary()
## 
## Call:
## MASS::glm.nb(formula = count_446110 ~ 1, data = virginia_pharm %>% 
##     sf::st_drop_geometry(), init.theta = 1.161837478, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0743  -1.0743  -1.0743   0.2123   2.7577  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.29120    0.03402   -8.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.1618) family taken to be 1)
## 
##     Null deviance: 1814.5  on 1899  degrees of freedom
## Residual deviance: 1814.5  on 1899  degrees of freedom
## AIC: 4534.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.162 
##           Std. Err.:  0.123 
## 
##  2 x log-likelihood:  -4530.724

Now that we see that the negative binomial model fits our pharmacy count data better, we will adjust for social variables. This is mainly exploratory, and more statistical care is needed, such as multiple-comparison adjustment, which suggests making the p-value threshold for statistical significance stricter given that we are performing multiple tests on the same data.
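As a sketch of one common multiple-comparison adjustment (Bonferroni; the p-values below are the two coefficient p-values reported in the model output, used purely for illustration):

```r
# With m tests at family-wise alpha = 0.05, each test is judged at alpha / m.
m <- 4
alpha <- 0.05
alpha / m
## [1] 0.0125

# Equivalently, adjust the observed p-values upward:
p.adjust(c(0.000644, 0.000668), method = "bonferroni")
## [1] 0.001288 0.001336
```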

MASS::glm.nb(count_446110 ~non_hisp_black_pc, data = virginia_pharm %>% sf::st_drop_geometry()) %>% 
  summary()
## 
## Call:
## MASS::glm.nb(formula = count_446110 ~ non_hisp_black_pc, data = virginia_pharm %>% 
##     sf::st_drop_geometry(), init.theta = 1.223436363, link = log)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.132  -1.105  -1.005   0.254   2.691  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -0.17112    0.04603  -3.717 0.000201 ***
## non_hisp_black_pc -0.57604    0.16882  -3.412 0.000644 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.2234) family taken to be 1)
## 
##     Null deviance: 1818.0  on 1878  degrees of freedom
## Residual deviance: 1806.6  on 1877  degrees of freedom
##   (21 observations deleted due to missingness)
## AIC: 4500.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.223 
##           Std. Err.:  0.133 
## 
##  2 x log-likelihood:  -4494.899
MASS::glm.nb(count_446110 ~non_hisp_white_pc, data = virginia_pharm %>% sf::st_drop_geometry()) %>% 
  summary()
## 
## Call:
## MASS::glm.nb(formula = count_446110 ~ non_hisp_white_pc, data = virginia_pharm %>% 
##     sf::st_drop_geometry(), init.theta = 1.22385824, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1621  -1.0967  -1.0084   0.2835   2.6372  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -0.5925     0.0984  -6.021 1.73e-09 ***
## non_hisp_white_pc   0.4883     0.1435   3.402 0.000668 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.2239) family taken to be 1)
## 
##     Null deviance: 1818.2  on 1878  degrees of freedom
## Residual deviance: 1807.0  on 1877  degrees of freedom
##   (21 observations deleted due to missingness)
## AIC: 4501.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.224 
##           Std. Err.:  0.133 
## 
##  2 x log-likelihood:  -4495.145

What about spatial correlation? Local indicators of spatial association (LISA) are used to determine whether spatial dependence exists in the data. A common indicator is Moran’s index.

While the global Moran’s I is a single statistic testing for spatial randomness, the local Moran’s I estimates the tendency of census tracts to cluster or to be spatial outliers. The null hypothesis is spatial randomness, and significant p-values suggest the spatial pattern is not random. Positive local Moran values indicate clustering of similar values (high near high, or low near low), while negative values indicate spatial outliers, where a census tract that is low relative to the mean has high-valued neighbors, or vice versa. For every census tract, a binary (0/1) weights matrix denoted \(w_{ij}\) identifies its neighbors based on a contiguity rule; in our case we use queen contiguity (neighbors share a border or a corner point), illustrated below:

\[ w_{ij}=\begin{bmatrix} 0 &1&1&1&0\\ 0&1&i&1&0\\ 0&1&1&1&0 \end{bmatrix} \]

The local Moran score for census tract \(i\) is denoted \(I_i\). Standardized values \(z_i\) for a census tract are compared with the standardized values \(z_j\) of its neighbors, and \(c\) is a scaling constant, the inverse of the sum of squared standardized values:

\[z_i = \frac{x_i - \mu}{\sigma} \;,\; z_j = \frac{x_j-\mu}{\sigma} \; and \; c = \Big(\sum_i z_i^{2}\Big)^{-1}\]

\[ I_i = c\,z_i\sum_j w_{ij}z_j \]
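A hand-computed toy example of the standardization step in the local Moran formula (illustrative values only; note that sd() uses the sample standard deviation, one common choice for \(\sigma\)):

```r
# Standardize a toy variable: subtract the mean and divide by the sd.
x <- c(2, 4, 6, 8)
z <- (x - mean(x)) / sd(x)
round(z, 3)
## [1] -1.162 -0.387  0.387  1.162

sum(z)   # standardized values always sum to zero
```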

spdep::moran.test(virginia_pharm$count_446110,
                  spdep::nb2listw(
                    spdep::include.self(
                      spdep::poly2nb(
                        virginia_pharm, queen = T))) , 
                  alternative = "greater")
## 
##  Moran I test under randomisation
## 
## data:  virginia_pharm$count_446110  
## weights: spdep::nb2listw(spdep::include.self(spdep::poly2nb(virginia_pharm,  
##     queen = T)))  
## 
## Moran I statistic standard deviate = 11.733, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.1464311483     -0.0005265929      0.0001568835
qw <- rgeoda::queen_weights(virginia_pharm)  # queen contiguity weights

moran <- rgeoda::local_moran(qw, virginia_pharm["count_446110"], cpu_threads = 4)

plot(sf::st_geometry(virginia_pharm),
     col = sapply(rgeoda::lisa_clusters(moran, cutoff = 0.05),
                  function(x) rgeoda::lisa_colors(moran)[[x + 1]]),
     border = "#333333", lwd = 0.2)

title(main = "Local Moran Map of Pharmacy Count")
legend('bottomleft', legend = rgeoda::lisa_labels(moran), 
       fill = rgeoda::lisa_colors(moran), border = "#eeeeee")

Given that there is evidence of spatial correlation, additional statistical care is needed to test associations while adjusting for spatial dependence. This will be covered on Day 2 during our special topics section.

Evening Review

In this section we learned how to conduct exploratory data analysis using popular packages, and we now have a more robust template for future working documents.

Additional resources

If you’d like to learn more about what we talked about during this session here are a few resources that go into more detail.

  • Exploratory data analysis is the initial step one takes before any modeling, so it is worth getting very familiar with the workflow and the packages that ease some of its more tedious aspects. Here’s an additional resource: (EDA)

  • Deeper take on hypothesis testing overview and book

  • More detail on dispersion in count data ref

Day 2: Morning

Geospatially modeling pharmacy access adjusting for other census tract level social variables

Research question: Measuring pharmacy access and assessing its association with other neighborhood-level variables.

Revisiting Access

Access is a term that refers to multiple dimensions of how one fits with a health system, as defined by Penchansky and Thomas in 1981.

  • Accessibility speaks to one’s proximity to an amenity (contiguity)

  • Availability to the volume of services, adjusting for population

  • Affordability to health insurance and income (thoughts?)

  • Acceptability to social norms when providing services (thoughts?)

  • Accommodation to the appropriateness of utilizing a service (thoughts?)

Assessing pharmacy access

Calculating Standardized Incidence Ratio

The standardized incidence ratio (SIR) compares the observed counts with the expected counts. A ratio above 1 suggests the rate of pharmacies is higher than expected, while a value below 1 suggests it is lower than expected. To calculate the expected count adjusted for population, the following formula is used:

\[E_i = \frac{total_{count}}{total_{pop}}*pop_i\]
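
A quick base-R check of this formula with made-up numbers (four hypothetical tracts); note that by construction the expected counts sum to the observed total:

```r
pop   <- c(1000, 2000, 3000, 4000)  # hypothetical tract populations
count <- c(1, 0, 2, 5)              # hypothetical pharmacy counts
E   <- pop * sum(count) / sum(pop)  # expected count, proportional to population
SIR <- count / E                    # standardized incidence ratio
E
# [1] 0.8 1.6 2.4 3.2
round(SIR, 2)
# [1] 1.25 0.00 0.83 1.56
```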

acs_pharm_data = virginia_pharm %>% 
  
  rename("pharmacy_count" = "count_446110") %>% 
  mutate(total_population = sum(population, na.rm = T),
         total_pharm = sum(pharmacy_count, na.rm = T),
         E = population*total_pharm/total_population,
         SIR = pharmacy_count/E) 

p_counts <- acs_pharm_data %>% 
  sf::st_drop_geometry() %>% 
  ggplot() +
  geom_histogram(aes(x = pharmacy_count), fill = "red", alpha = 0.3) +
  geom_histogram(aes(x = E), fill = "blue", alpha = 0.3)

p_sir <- acs_pharm_data %>% 
  sf::st_drop_geometry() %>% 
  ggplot() +
  geom_histogram(aes(x = SIR), fill = "green", alpha = 0.3)

# side-by-side display uses the patchwork package
p_counts + p_sir

Geospatially, we can visualize the areas that have a lower than expected pharmacy count, shown in purple, while areas in brighter colors represent pharmacy counts that are higher than expected.

p_count_map <- acs_pharm_data %>% 
  ggplot() +
  geom_sf(aes(fill = pharmacy_count %>% log())) +
  scale_fill_viridis_b(name = "Log Pharmacy Count")

p_sir_map <- acs_pharm_data %>% 
  ggplot() +
  geom_sf(aes(fill = SIR %>% log())) +
  scale_fill_viridis_b(name = "Log SIR")

# side-by-side display uses the patchwork package
p_count_map + p_sir_map

Building a hierarchical model

Let \(Y_i \sim Poisson(\eta_i)\) be the distribution of the number of pharmacies in a given census tract, without considering spatial dependence, where \(\eta_i\) is the rate for each census tract \(i\). For the Bayesian framework we reparameterize \(\eta_i = E_i \theta_i\), where \(E_i\) is the expected pharmacy count based on population and \(\theta_i\) is the parameter of interest estimating the rate of pharmacies.

\[\log(\theta) = \beta_0+\beta_{1p} X+b\]

where \(\beta_0\) is the model intercept, \(\beta_{1p}\) are the coefficients for the \(p\) covariates in the design matrix \(X\), and \(b\) is the spatial random effect following the BYM2 (Besag, York, and Mollié) specification:

\[\begin{aligned} b &= \frac{1}{\sqrt{\tau_b}}\left(\sqrt{\rho}\,\mu+\sqrt{1-\rho}\,v\right) \\ \mu_i \mid \mu_{j \ne i} &\sim N\!\left( \frac{1}{n_i}\sum_j W_{ij}\mu_j,\ \frac{1}{n_i}\right), \text{ where } n_i \text{ is the number of neighbors of tract } i,\\ v_i &\sim N(0,1) \\ \rho &\in (0,1) \\ \tau_b &\sim PC(U,\alpha) \end{aligned}\]

where the spatial component \(\mu\) and the non-spatial component \(v\) are scaled to have variance \(1\).
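
A quick check on this scaling: because \(\mu\) and \(v\) each have (generalized) variance one, the marginal variance of the random effect does not depend on the mixing parameter \(\rho\), so \(\tau_b\) alone controls its overall magnitude:

\[\operatorname{Var}(b_i) \approx \frac{1}{\tau_b}\bigl(\rho\cdot 1 + (1-\rho)\cdot 1\bigr) = \frac{1}{\tau_b}\]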

We first confirm that spatial correlation is present by comparing a model fit with no spatial random effect to another that includes a spatial random effect.

# Creating the neighborhood graph for INLA
nb <- spdep::poly2nb(acs_pharm_data, queen = T)

spdep::nb2INLA("map.adj", nb)

g <- INLA::inla.read.graph(filename = "map.adj")

# Creating spatial IDs for the random effect
acs_pharm_data$re_u <- 1:nrow(acs_pharm_data)

model <- INLA::inla(pharmacy_count ~ 1,
                  family = "poisson", 
                  E= E,
                  data = as.data.frame(acs_pharm_data), 
                  control.predictor = list(compute = TRUE
                                          # ,link = 1
                                           ),
                  control.compute = list(dic = TRUE,
                                         cpo = T,
                                         config = T))

modela <- INLA::inla(pharmacy_count ~ 1 + f(re_u, model = "bym2", constr = T,
                                          scale.model = T,
                                          graph = g#, 
                                          #hyper = prior
                                          ),
                  family = "poisson", 
                  E= E,
                  data = as.data.frame(acs_pharm_data), 
                  control.predictor = list(compute = TRUE,
                                           link = 1),
                  control.compute = list(dic = TRUE,
                                         cpo = T,
                                         config = T))

# Cache the fitted models so the document can be re-knit without refitting
#saveRDS(model, "inla_model.rds")
#saveRDS(modela, "inla_modela.rds")
model <- readRDS("inla_model.rds")
modela <- readRDS("inla_modela.rds")
#INLA::inla.priors.used(model)
model$dic$dic
## [1] 4603.244
modela$dic$dic
## [1] 4295.739
acs_pharm_data$RR_mod1 <- modela$summary.fitted.values[, "mean"]
acs_pharm_data$RR_mod1_HH <- modela$summary.fitted.values[, "0.975quant"]
acs_pharm_data$RR_mod1_LL <- modela$summary.fitted.values[, "0.025quant"]

p_rr <- ggplot(sf::st_as_sf(acs_pharm_data)) + 
  geom_sf(aes(fill = RR_mod1 %>% log()), size = 0.01) +
  scale_fill_gradient2(midpoint = 0, low = "blue", mid = "white",
                       high = "red", name = "", na.value = "#FFFFFF") +
  theme(axis.text = element_blank(),
        legend.position = c(0.2, 0.8))

# mask tracts whose 95% credible interval crosses 1 ("average" access)
p_rr_sig <- sf::st_as_sf(acs_pharm_data %>% 
                   mutate(ind = case_when(RR_mod1_HH > 1 & RR_mod1_LL > 1 ~ 2,
                                          RR_mod1_HH < 1 & RR_mod1_LL < 1 ~ 1,
                                          RR_mod1_HH > 1 & RR_mod1_LL < 1 ~ 0),
                          RR_mod1 = ifelse(ind == 0, NA, RR_mod1))) %>% 
  ggplot() + 
  geom_sf(aes(fill = RR_mod1 %>% log()), size = 0.01) +
  scale_fill_gradient2(midpoint = 0, low = "blue", mid = "white",
                       high = "red", name = "", na.value = "#FFFFFF") +
  theme(axis.text = element_blank(),
        legend.position = c(0.2, 0.8))

# side-by-side display uses the patchwork package
p_rr + p_rr_sig

The lower DIC value for the model adjusting for spatial correlation suggests it is the better-fitting model. Next we look at the distribution of our social variables with respect to pharmacy access levels.
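
The classification of tracts into high, low, or average access from the credible interval bounds is reused several times below, so it helps to see the rule in isolation. A minimal sketch (the function name `classify_rr` is ours, not from any package):

```r
# Classify a tract by where the 95% credible interval of its fitted
# relative risk (RR) falls relative to 1.
classify_rr <- function(lower, upper) {
  ifelse(lower > 1 & upper > 1, "high",        # whole interval above 1
         ifelse(lower < 1 & upper < 1, "low",  # whole interval below 1
                "average"))                    # interval crosses 1
}

classify_rr(lower = c(1.2, 0.4, 0.8), upper = c(2.0, 0.9, 1.3))
# [1] "high"    "low"     "average"
```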

Table of variable distribution by average and high pharmacy access neighborhoods.

tableby(ind ~ pharmacy_count + population + drive_pc +
          public_trans_pc + walked_pc + work_home_pc +
          non_hisp_black_pc + non_hisp_white_pc + non_hisp_asian_pc + hisp_pc +
          below_hs_pc + college_pc +
          age_over_65_pc + age_65_under_pc +
          unemployment_pc,
        data = acs_pharm_data %>% 
          mutate(ind = case_when(RR_mod1_HH > 1 & RR_mod1_LL > 1 ~ 2,
                                 RR_mod1_HH < 1 & RR_mod1_LL < 1 ~ 1,
                                 RR_mod1_HH > 1 & RR_mod1_LL < 1 ~ 0),
                 RR_mod1 = ifelse(ind == 0, NA, RR_mod1)),
        control = my_controls_row) %>% 
  summary(text = TRUE) %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
0 (N=1875) 2 (N=25) Total (N=1900) p value
pharmacy_count < 0.001
  • N
1875 25 1900
  • Mean (SD)
0.691 (0.962) 5.000 (0.816) 0.747 (1.078)
  • Median (Q1, Q3)
0.000 (0.000, 1.000) 5.000 (4.000, 5.000) 0.000 (0.000, 1.000)
  • Min - Max
0.000 - 5.000 4.000 - 7.000 0.000 - 7.000
  • Missing
0 0 0
population 0.450
  • N
1875 25 1900
  • Mean (SD)
4406.850 (1941.796) 4124.320 (1533.838) 4403.133 (1936.932)
  • Median (Q1, Q3)
4186.000 (3038.000, 5562.500) 3862.000 (3094.000, 4865.000) 4183.000 (3043.000, 5555.750)
  • Min - Max
0.000 - 15271.000 1631.000 - 8072.000 0.000 - 15271.000
  • Missing
0 0 0
drive_pc 0.690
  • N
1850 25 1875
  • Mean (SD)
0.868 (0.109) 0.833 (0.148) 0.868 (0.109)
  • Median (Q1, Q3)
0.900 (0.836, 0.936) 0.910 (0.728, 0.936) 0.900 (0.836, 0.936)
  • Min - Max
0.280 - 1.000 0.480 - 0.977 0.280 - 1.000
  • Missing
25 0 25
public_trans_pc 0.305
  • N
1850 25 1875
  • Mean (SD)
0.042 (0.071) 0.040 (0.073) 0.042 (0.071)
  • Median (Q1, Q3)
0.013 (0.000, 0.052) 0.006 (0.000, 0.011) 0.013 (0.000, 0.052)
  • Min - Max
0.000 - 0.556 0.000 - 0.247 0.000 - 0.556
  • Missing
25 0 25
walked_pc 0.016
  • N
1850 25 1875
  • Mean (SD)
0.025 (0.051) 0.057 (0.092) 0.026 (0.052)
  • Median (Q1, Q3)
0.010 (0.001, 0.026) 0.023 (0.005, 0.045) 0.010 (0.001, 0.027)
  • Min - Max
0.000 - 0.548 0.000 - 0.362 0.000 - 0.548
  • Missing
25 0 25
work_home_pc 0.324
  • N
1850 25 1875
  • Mean (SD)
0.047 (0.034) 0.051 (0.030) 0.047 (0.034)
  • Median (Q1, Q3)
0.041 (0.022, 0.064) 0.045 (0.031, 0.069) 0.041 (0.022, 0.064)
  • Min - Max
0.000 - 0.439 0.000 - 0.119 0.000 - 0.439
  • Missing
25 0 25
non_hisp_black_pc 0.099
  • N
1854 25 1879
  • Mean (SD)
0.202 (0.215) 0.116 (0.102) 0.201 (0.214)
  • Median (Q1, Q3)
0.128 (0.050, 0.272) 0.087 (0.046, 0.162) 0.127 (0.049, 0.272)
  • Min - Max
0.000 - 0.973 0.004 - 0.420 0.000 - 0.973
  • Missing
21 0 21
non_hisp_white_pc 0.011
  • N
1854 25 1879
  • Mean (SD)
0.625 (0.242) 0.751 (0.153) 0.627 (0.241)
  • Median (Q1, Q3)
0.662 (0.466, 0.821) 0.764 (0.703, 0.846) 0.664 (0.469, 0.822)
  • Min - Max
0.000 - 1.000 0.410 - 0.984 0.000 - 1.000
  • Missing
21 0 21
non_hisp_asian_pc 0.931
  • N
1854 25 1879
  • Mean (SD)
0.056 (0.080) 0.048 (0.065) 0.056 (0.079)
  • Median (Q1, Q3)
0.024 (0.004, 0.077) 0.025 (0.007, 0.079) 0.024 (0.004, 0.077)
  • Min - Max
0.000 - 0.613 0.000 - 0.252 0.000 - 0.613
  • Missing
21 0 21
hisp_pc 0.401
  • N
1854 25 1879
  • Mean (SD)
0.083 (0.097) 0.056 (0.045) 0.083 (0.096)
  • Median (Q1, Q3)
0.052 (0.022, 0.106) 0.041 (0.027, 0.066) 0.052 (0.022, 0.106)
  • Min - Max
0.000 - 0.766 0.000 - 0.170 0.000 - 0.766
  • Missing
21 0 21
below_hs_pc 0.738
  • N
1854 25 1879
  • Mean (SD)
0.486 (0.056) 0.493 (0.065) 0.486 (0.056)
  • Median (Q1, Q3)
0.484 (0.461, 0.506) 0.485 (0.453, 0.515) 0.484 (0.461, 0.506)
  • Min - Max
0.000 - 1.000 0.375 - 0.649 0.000 - 1.000
  • Missing
21 0 21
college_pc 0.136
  • N
1854 25 1879
  • Mean (SD)
0.259 (0.068) 0.245 (0.061) 0.259 (0.067)
  • Median (Q1, Q3)
0.273 (0.223, 0.307) 0.253 (0.196, 0.292) 0.273 (0.223, 0.307)
  • Min - Max
0.000 - 0.403 0.100 - 0.363 0.000 - 0.403
  • Missing
21 0 21
age_over_65_pc 0.008
  • N
1854 25 1879
  • Mean (SD)
0.031 (0.015) 0.025 (0.019) 0.031 (0.015)
  • Median (Q1, Q3)
0.030 (0.021, 0.041) 0.019 (0.014, 0.029) 0.030 (0.021, 0.041)
  • Min - Max
0.000 - 0.091 0.000 - 0.069 0.000 - 0.091
  • Missing
21 0 21
age_65_under_pc 0.008
  • N
1854 25 1879
  • Mean (SD)
0.969 (0.015) 0.975 (0.019) 0.969 (0.015)
  • Median (Q1, Q3)
0.970 (0.959, 0.979) 0.981 (0.971, 0.986) 0.970 (0.959, 0.979)
  • Min - Max
0.909 - 1.000 0.931 - 1.000 0.909 - 1.000
  • Missing
21 0 21
unemployment_pc 0.216
  • N
1850 25 1875
  • Mean (SD)
0.058 (0.039) 0.065 (0.037) 0.058 (0.039)
  • Median (Q1, Q3)
0.050 (0.032, 0.074) 0.063 (0.036, 0.082) 0.050 (0.032, 0.074)
  • Min - Max
0.000 - 0.327 0.016 - 0.166 0.000 - 0.327
  • Missing
25 0 25

Adjusting for social variables

model2 <- INLA::inla(pharmacy_count ~ drive_pc +median_hh_inc +below_hs_pc +
  college_pc + age_over_65_pc + unemployment_pc + f(re_u, model = "bym2", constr = T,
                                          scale.model = T,
                                          graph = g#, 
                                          #hyper = prior
                                          ),
                  family = "poisson", 
                  E= E,
                  data = as.data.frame(acs_pharm_data), 
                  control.predictor = list(compute = TRUE,
                                           link = 1),
                  control.compute = list(dic = TRUE,
                                         cpo = T,
                                         config = T))

# saveRDS(model2, "inla_model2.rds")
model2 <- readRDS("inla_model2.rds")

model2$summary.fixed %>%
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 1.5524432 0.5088196 0.5535944 1.5527052 2.5498094 1.5532283 0
drive_pc 0.3921986 0.4151741 -0.4218401 0.3920505 1.2070837 0.3917629 0
median_hh_inc 0.0000011 0.0000015 -0.0000017 0.0000011 0.0000040 0.0000011 0
below_hs_pc -2.3969757 0.7018009 -3.7747984 -2.3965338 -1.0216560 -2.3956537 0
college_pc -1.4515916 0.8095431 -3.0415995 -1.4508622 0.1342867 -1.4494084 0
age_over_65_pc -14.1023051 2.5429199 -19.0952640 -14.1004829 -9.1196579 -14.0968442 0
unemployment_pc -1.7813918 1.0320347 -3.8068342 -1.7809921 0.2417878 -1.7801948 0

Adjusting for race/ethnicity

model3 <- INLA::inla(pharmacy_count ~  below_hs_pc + age_over_65_pc +  non_hisp_black_pc +
    non_hisp_asian_pc +
    hisp_pc   + f(re_u, model = "bym2", constr = T,
                                          scale.model = T,
                                          graph = g#, 
                                          #hyper = prior
                                          ),
                  family = "poisson", 
                  E= E,
                  data = as.data.frame(acs_pharm_data), 
                  control.predictor = list(compute = TRUE,
                                           link = 1),
                  control.compute = list(dic = TRUE,
                                         cpo = T,
                                         config = T))
 #saveRDS(model3, "inla_model3.rds")
model3 <- readRDS("inla_model3.rds")

model3$summary.fixed %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 1.7528588 0.3700598 1.0262775 1.7531160 2.4779909 1.7536354 0
below_hs_pc -2.9017122 0.7214740 -4.3182868 -2.9012041 -1.4880100 -2.9001867 0
age_over_65_pc -15.9760566 2.3384438 -20.5650973 -15.9752735 -11.3914527 -15.9737139 0
non_hisp_black_pc -0.5104305 0.1963984 -0.8956691 -0.5104553 -0.1250489 -0.5105019 0
non_hisp_asian_pc -0.0940878 0.5344479 -1.1338354 -0.0971959 0.9631976 -0.1034178 0
hisp_pc 0.0484742 0.4052122 -0.7430814 0.0473430 0.8464207 0.0450768 0
acs_pharm_data$RR_mod3 <- model3$summary.fitted.values[, "mean"]
acs_pharm_data$RR_mod3_HH <- model3$summary.fitted.values[, "0.975quant"]
acs_pharm_data$RR_mod3_LL <- model3$summary.fitted.values[, "0.025quant"]


p_rr3 <- ggplot(sf::st_as_sf(acs_pharm_data)) + 
  geom_sf(aes(fill = RR_mod3 %>% log()), size = 0.01) +
  scale_fill_gradient2(midpoint = 0, low = "blue", mid = "white",
                       high = "red", name = "", na.value = "#FFFFFF") +
  theme(axis.text = element_blank(),
        legend.position = c(0.2, 0.8))

# mask tracts whose 95% credible interval crosses 1 ("average" access)
p_rr3_sig <- sf::st_as_sf(acs_pharm_data %>% 
                   mutate(ind = case_when(RR_mod3_HH > 1 & RR_mod3_LL > 1 ~ 2,
                                          RR_mod3_HH < 1 & RR_mod3_LL < 1 ~ 1,
                                          RR_mod3_HH > 1 & RR_mod3_LL < 1 ~ 0),
                          RR_mod3 = ifelse(ind == 0, NA, RR_mod3))) %>% 
  ggplot() + 
  geom_sf(aes(fill = RR_mod3 %>% log()), size = 0.01) +
  scale_fill_gradient2(midpoint = 0, low = "blue", mid = "white",
                       high = "red", name = "", na.value = "#FFFFFF") +
  theme(axis.text = element_blank(),
        legend.position = c(0.2, 0.8))

# side-by-side display uses the patchwork package
p_rr3 + p_rr3_sig

tableby(ind ~ pharmacy_count + E + RR_mod3 + population + drive_pc +
          public_trans_pc + walked_pc + work_home_pc +
          non_hisp_black_pc + non_hisp_white_pc + non_hisp_asian_pc + hisp_pc +
          below_hs_pc + college_pc +
          age_over_65_pc + age_65_under_pc +
          unemployment_pc,
        data = acs_pharm_data %>% 
          mutate(ind = case_when(RR_mod3_HH > 1 & RR_mod3_LL > 1 ~ "high",
                                 RR_mod3_HH < 1 & RR_mod3_LL < 1 ~ "low",
                                 RR_mod3_HH > 1 & RR_mod3_LL < 1 ~ "average")),
        control = my_controls_row) %>% 
  summary(test = F)
average (N=1849) high (N=49) low (N=2) Total (N=1900)
pharmacy_count
   N 1849 49 2 1900
   Mean (SD) 0.699 (0.968) 2.612 (2.498) 0.000 (0.000) 0.747 (1.078)
   Median (Q1, Q3) 0.000 (0.000, 1.000) 3.000 (0.000, 5.000) 0.000 (0.000, 0.000) 0.000 (0.000, 1.000)
   Min - Max 0.000 - 5.000 0.000 - 7.000 0.000 - 0.000 0.000 - 7.000
   Missing 0 0 0 0
E
   N 1849 49 2 1900
   Mean (SD) 0.757 (0.321) 0.371 (0.394) 0.887 (0.462) 0.747 (0.329)
   Median (Q1, Q3) 0.718 (0.522, 0.946) 0.380 (0.000, 0.647) 0.887 (0.724, 1.051) 0.710 (0.517, 0.943)
   Min - Max 0.001 - 2.592 0.000 - 1.370 0.561 - 1.214 0.000 - 2.592
   Missing 0 0 0 0
RR_mod3
   N 1849 49 2 1900
   Mean (SD) 1.042 (0.493) 5.749 (2.111) 0.306 (0.023) 1.163 (0.952)
   Median (Q1, Q3) 0.901 (0.699, 1.265) 5.369 (3.707, 7.584) 0.306 (0.298, 0.314) 0.914 (0.701, 1.300)
   Min - Max 0.307 - 3.344 2.695 - 9.550 0.290 - 0.323 0.290 - 9.550
   Missing 0 0 0 0
population
   N 1849 49 2 1900
   Mean (SD) 4460.984 (1891.347) 2186.429 (2318.872) 5228.500 (2721.654) 4403.133 (1936.932)
   Median (Q1, Q3) 4232.000 (3078.000, 5572.000) 2237.000 (0.000, 3809.000) 5228.500 (4266.250, 6190.750) 4183.000 (3043.000, 5555.750)
   Min - Max 8.000 - 15271.000 0.000 - 8072.000 3304.000 - 7153.000 0.000 - 15271.000
   Missing 0 0 0 0
drive_pc
   N 1845 28 2 1875
   Mean (SD) 0.868 (0.108) 0.816 (0.167) 0.906 (0.007) 0.868 (0.109)
   Median (Q1, Q3) 0.900 (0.837, 0.936) 0.906 (0.690, 0.938) 0.906 (0.904, 0.909) 0.900 (0.836, 0.936)
   Min - Max 0.280 - 1.000 0.422 - 1.000 0.901 - 0.912 0.280 - 1.000
   Missing 4 21 0 25
public_trans_pc
   N 1845 28 2 1875
   Mean (SD) 0.042 (0.071) 0.055 (0.097) 0.053 (0.021) 0.042 (0.071)
   Median (Q1, Q3) 0.013 (0.000, 0.052) 0.010 (0.000, 0.053) 0.053 (0.046, 0.061) 0.013 (0.000, 0.052)
   Min - Max 0.000 - 0.556 0.000 - 0.399 0.039 - 0.068 0.000 - 0.556
   Missing 4 21 0 25
walked_pc
   N 1845 28 2 1875
   Mean (SD) 0.025 (0.051) 0.062 (0.092) 0.013 (0.018) 0.026 (0.052)
   Median (Q1, Q3) 0.010 (0.001, 0.026) 0.023 (0.004, 0.057) 0.013 (0.006, 0.019) 0.010 (0.001, 0.027)
   Min - Max 0.000 - 0.548 0.000 - 0.362 0.000 - 0.026 0.000 - 0.548
   Missing 4 21 0 25
work_home_pc
   N 1845 28 2 1875
   Mean (SD) 0.047 (0.034) 0.046 (0.027) 0.025 (0.006) 0.047 (0.034)
   Median (Q1, Q3) 0.041 (0.022, 0.064) 0.044 (0.030, 0.062) 0.025 (0.022, 0.027) 0.041 (0.022, 0.064)
   Min - Max 0.000 - 0.439 0.000 - 0.110 0.020 - 0.029 0.000 - 0.439
   Missing 4 21 0 25
non_hisp_black_pc
   N 1849 28 2 1879
   Mean (SD) 0.202 (0.215) 0.115 (0.103) 0.145 (0.101) 0.201 (0.214)
   Median (Q1, Q3) 0.128 (0.050, 0.272) 0.088 (0.038, 0.145) 0.145 (0.109, 0.180) 0.127 (0.049, 0.272)
   Min - Max 0.000 - 0.973 0.000 - 0.420 0.074 - 0.216 0.000 - 0.973
   Missing 0 21 0 21
non_hisp_white_pc
   N 1849 28 2 1879
   Mean (SD) 0.625 (0.242) 0.758 (0.166) 0.386 (0.249) 0.627 (0.241)
   Median (Q1, Q3) 0.662 (0.466, 0.820) 0.780 (0.668, 0.870) 0.386 (0.298, 0.474) 0.664 (0.469, 0.822)
   Min - Max 0.000 - 1.000 0.410 - 1.000 0.210 - 0.562 0.000 - 1.000
   Missing 0 21 0 21
non_hisp_asian_pc
   N 1849 28 2 1879
   Mean (SD) 0.056 (0.080) 0.048 (0.068) 0.081 (0.071) 0.056 (0.079)
   Median (Q1, Q3) 0.024 (0.004, 0.077) 0.019 (0.006, 0.069) 0.081 (0.056, 0.106) 0.024 (0.004, 0.077)
   Min - Max 0.000 - 0.613 0.000 - 0.252 0.031 - 0.131 0.000 - 0.613
   Missing 0 21 0 21
hisp_pc
   N 1849 28 2 1879
   Mean (SD) 0.083 (0.096) 0.051 (0.047) 0.343 (0.243) 0.083 (0.096)
   Median (Q1, Q3) 0.052 (0.022, 0.106) 0.038 (0.014, 0.069) 0.343 (0.257, 0.428) 0.052 (0.022, 0.106)
   Min - Max 0.000 - 0.766 0.000 - 0.170 0.171 - 0.514 0.000 - 0.766
   Missing 0 21 0 21
below_hs_pc
   N 1849 28 2 1879
   Mean (SD) 0.487 (0.055) 0.461 (0.108) 0.498 (0.002) 0.486 (0.056)
   Median (Q1, Q3) 0.484 (0.461, 0.506) 0.477 (0.439, 0.507) 0.498 (0.497, 0.498) 0.484 (0.461, 0.506)
   Min - Max 0.252 - 1.000 0.000 - 0.649 0.496 - 0.499 0.000 - 1.000
   Missing 0 21 0 21
college_pc
   N 1849 28 2 1879
   Mean (SD) 0.260 (0.067) 0.221 (0.079) 0.332 (0.005) 0.259 (0.067)
   Median (Q1, Q3) 0.274 (0.223, 0.307) 0.243 (0.193, 0.277) 0.332 (0.330, 0.333) 0.273 (0.223, 0.307)
   Min - Max 0.000 - 0.403 0.000 - 0.327 0.328 - 0.335 0.000 - 0.403
   Missing 0 21 0 21
age_over_65_pc
   N 1849 28 2 1879
   Mean (SD) 0.031 (0.015) 0.016 (0.012) 0.085 (0.008) 0.031 (0.015)
   Median (Q1, Q3) 0.030 (0.021, 0.041) 0.014 (0.009, 0.021) 0.085 (0.082, 0.088) 0.030 (0.021, 0.041)
   Min - Max 0.000 - 0.079 0.000 - 0.057 0.079 - 0.091 0.000 - 0.091
   Missing 0 21 0 21
age_65_under_pc
   N 1849 28 2 1879
   Mean (SD) 0.969 (0.015) 0.984 (0.012) 0.915 (0.008) 0.969 (0.015)
   Median (Q1, Q3) 0.970 (0.959, 0.979) 0.986 (0.979, 0.991) 0.915 (0.912, 0.918) 0.970 (0.959, 0.979)
   Min - Max 0.921 - 1.000 0.943 - 1.000 0.909 - 0.921 0.909 - 1.000
   Missing 0 21 0 21
unemployment_pc
   N 1845 28 2 1875
   Mean (SD) 0.058 (0.039) 0.059 (0.037) 0.047 (0.031) 0.058 (0.039)
   Median (Q1, Q3) 0.050 (0.032, 0.074) 0.052 (0.034, 0.075) 0.047 (0.036, 0.057) 0.050 (0.032, 0.074)
   Min - Max 0.000 - 0.327 0.000 - 0.166 0.025 - 0.068 0.000 - 0.327
   Missing 4 21 0 25

Clustering of social variables

set.seed(123)

avg_access_data = acs_pharm_data %>% 
                   mutate(ind = case_when(RR_mod1_HH > 1 & RR_mod1_LL> 1 ~ 2,
                                         RR_mod1_HH < 1 & RR_mod1_LL< 1 ~ 1,
                                         RR_mod1_HH > 1 & RR_mod1_LL< 1 ~ 0),
                         RR_mod1 = ifelse(ind == 0, NA,RR_mod1)) %>% 
  filter(ind == 0)

# impute remaining missing values with the column mean before clustering
clusters = kmeans(
  avg_access_data %>% 
    sf::st_drop_geometry() %>% 
    select(population, drive_pc, public_trans_pc, walked_pc, work_home_pc,
           non_hisp_black_pc, non_hisp_white_pc, non_hisp_asian_pc, hisp_pc,
           below_hs_pc, college_pc, age_over_65_pc, age_65_under_pc,
           unemployment_pc) %>% 
    mutate(across(everything(), ~ replace_na(., mean(., na.rm = TRUE)))),
  centers = 2, iter.max = 10, nstart = 1)

avg_access_data$cluster = clusters$cluster

tableby(cluster ~ population + drive_pc +
          public_trans_pc + walked_pc + work_home_pc +
          non_hisp_black_pc + non_hisp_white_pc + non_hisp_asian_pc + hisp_pc +
          below_hs_pc + college_pc +
          age_over_65_pc + age_65_under_pc +
          unemployment_pc,
        data = avg_access_data, control = my_controls_row) %>% 
  summary(text = TRUE, test = F) %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
1 (N=772) 2 (N=1103) Total (N=1875)
population
  • N
772 1103 1875
  • Mean (SD)
6252.554 (1357.947) 3115.024 (1032.019) 4406.850 (1941.796)
  • Median (Q1, Q3)
5906.500 (5286.000, 6857.500) 3250.000 (2459.500, 3934.000) 4186.000 (3038.000, 5562.500)
  • Min - Max
4685.000 - 15271.000 0.000 - 4678.000 0.000 - 15271.000
  • Missing
0 0 0
drive_pc
  • N
772 1078 1850
  • Mean (SD)
0.876 (0.095) 0.863 (0.118) 0.868 (0.109)
  • Median (Q1, Q3)
0.902 (0.849, 0.933) 0.899 (0.829, 0.939) 0.900 (0.836, 0.936)
  • Min - Max
0.309 - 1.000 0.280 - 1.000 0.280 - 1.000
  • Missing
0 25 25
public_trans_pc
  • N
772 1078 1850
  • Mean (SD)
0.038 (0.063) 0.044 (0.076) 0.042 (0.071)
  • Median (Q1, Q3)
0.013 (0.000, 0.050) 0.013 (0.000, 0.054) 0.013 (0.000, 0.052)
  • Min - Max
0.000 - 0.467 0.000 - 0.556 0.000 - 0.556
  • Missing
0 25 25
walked_pc
  • N
772 1078 1850
  • Mean (SD)
0.022 (0.045) 0.028 (0.055) 0.025 (0.051)
  • Median (Q1, Q3)
0.009 (0.003, 0.023) 0.011 (0.001, 0.028) 0.010 (0.001, 0.026)
  • Min - Max
0.000 - 0.487 0.000 - 0.548 0.000 - 0.548
  • Missing
0 25 25
work_home_pc
  • N
772 1078 1850
  • Mean (SD)
0.048 (0.030) 0.046 (0.037) 0.047 (0.034)
  • Median (Q1, Q3)
0.044 (0.026, 0.065) 0.040 (0.019, 0.063) 0.041 (0.022, 0.064)
  • Min - Max
0.000 - 0.182 0.000 - 0.439 0.000 - 0.439
  • Missing
0 25 25
non_hisp_black_pc
  • N
772 1082 1854
  • Mean (SD)
0.173 (0.172) 0.223 (0.239) 0.202 (0.215)
  • Median (Q1, Q3)
0.115 (0.052, 0.236) 0.141 (0.047, 0.299) 0.128 (0.050, 0.272)
  • Min - Max
0.000 - 0.914 0.000 - 0.973 0.000 - 0.973
  • Missing
0 21 21
non_hisp_white_pc
  • N
772 1082 1854
  • Mean (SD)
0.626 (0.227) 0.624 (0.252) 0.625 (0.242)
  • Median (Q1, Q3)
0.659 (0.469, 0.809) 0.669 (0.466, 0.830) 0.662 (0.466, 0.821)
  • Min - Max
0.035 - 0.997 0.000 - 1.000 0.000 - 1.000
  • Missing
0 21 21
non_hisp_asian_pc
  • N
772 1082 1854
  • Mean (SD)
0.068 (0.084) 0.048 (0.075) 0.056 (0.080)
  • Median (Q1, Q3)
0.034 (0.010, 0.093) 0.016 (0.003, 0.059) 0.024 (0.004, 0.077)
  • Min - Max
0.000 - 0.613 0.000 - 0.543 0.000 - 0.613
  • Missing
0 21 21
hisp_pc
  • N
772 1082 1854
  • Mean (SD)
0.099 (0.110) 0.072 (0.084) 0.083 (0.097)
  • Median (Q1, Q3)
0.061 (0.029, 0.120) 0.045 (0.018, 0.097) 0.052 (0.022, 0.106)
  • Min - Max
0.000 - 0.766 0.000 - 0.699 0.000 - 0.766
  • Missing
0 21 21
below_hs_pc
  • N
772 1082 1854
  • Mean (SD)
0.486 (0.042) 0.486 (0.064) 0.486 (0.056)
  • Median (Q1, Q3)
0.483 (0.464, 0.504) 0.484 (0.458, 0.509) 0.484 (0.461, 0.506)
  • Min - Max
0.377 - 0.798 0.000 - 1.000 0.000 - 1.000
  • Missing
0 21 21
college_pc
  • N
772 1082 1854
  • Mean (SD)
0.271 (0.060) 0.251 (0.071) 0.259 (0.068)
  • Median (Q1, Q3)
0.281 (0.241, 0.311) 0.266 (0.211, 0.303) 0.273 (0.223, 0.307)
  • Min - Max
0.000 - 0.403 0.000 - 0.401 0.000 - 0.403
  • Missing
0 21 21
age_over_65_pc
  • N
772 1082 1854
  • Mean (SD)
0.033 (0.014) 0.030 (0.015) 0.031 (0.015)
  • Median (Q1, Q3)
0.033 (0.024, 0.041) 0.029 (0.019, 0.041) 0.030 (0.021, 0.041)
  • Min - Max
0.000 - 0.079 0.000 - 0.091 0.000 - 0.091
  • Missing
0 21 21
age_65_under_pc
  • N
772 1082 1854
  • Mean (SD)
0.967 (0.014) 0.970 (0.015) 0.969 (0.015)
  • Median (Q1, Q3)
0.967 (0.959, 0.976) 0.971 (0.959, 0.981) 0.970 (0.959, 0.979)
  • Min - Max
0.921 - 1.000 0.909 - 1.000 0.909 - 1.000
  • Missing
0 21 21
unemployment_pc
  • N
772 1078 1850
  • Mean (SD)
0.054 (0.030) 0.061 (0.044) 0.058 (0.039)
  • Median (Q1, Q3)
0.047 (0.033, 0.068) 0.052 (0.032, 0.080) 0.050 (0.032, 0.074)
  • Min - Max
0.000 - 0.231 0.000 - 0.327 0.000 - 0.327
  • Missing
0 25 25
high_access_data = acs_pharm_data %>% 
                   mutate(ind = case_when(RR_mod1_HH > 1 & RR_mod1_LL> 1 ~ 2,
                                         RR_mod1_HH < 1 & RR_mod1_LL< 1 ~ 1,
                                         RR_mod1_HH > 1 & RR_mod1_LL< 1 ~ 0),
                         RR_mod1 = ifelse(ind == 0, NA,RR_mod1)) %>% 
  filter(ind == 2)

# impute remaining missing values with the column mean before clustering
clusters_high = kmeans(
  high_access_data %>% 
    sf::st_drop_geometry() %>% 
    select(population, drive_pc, public_trans_pc, walked_pc, work_home_pc,
           non_hisp_black_pc, non_hisp_white_pc, non_hisp_asian_pc, hisp_pc,
           below_hs_pc, college_pc, age_over_65_pc, age_65_under_pc,
           unemployment_pc) %>% 
    mutate(across(everything(), ~ replace_na(., mean(., na.rm = TRUE)))),
  centers = 2, iter.max = 10, nstart = 1)

high_access_data$cluster = clusters_high$cluster

tableby(cluster ~ population + drive_pc +
          public_trans_pc + walked_pc + work_home_pc +
          non_hisp_black_pc + non_hisp_white_pc + non_hisp_asian_pc + hisp_pc +
          below_hs_pc + college_pc +
          age_over_65_pc + age_65_under_pc +
          unemployment_pc,
        data = high_access_data, control = my_controls_row) %>% 
  summary(text = TRUE, test = F) %>% 
  knitr::kable() %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::scroll_box(width = "100%", height = "200px")
1 (N=18) 2 (N=7) Total (N=25)
population
  • N
18 7 25
  • Mean (SD)
3364.778 (802.742) 6077.429 (1184.795) 4124.320 (1533.838)
  • Median (Q1, Q3)
3547.000 (2836.500, 3903.250) 5896.000 (5136.500, 6718.000) 3862.000 (3094.000, 4865.000)
  • Min - Max
1631.000 - 4454.000 4865.000 - 8072.000 1631.000 - 8072.000
  • Missing
0 0 0
drive_pc
  • N
18 7 25
  • Mean (SD)
0.854 (0.140) 0.781 (0.166) 0.833 (0.148)
  • Median (Q1, Q3)
0.916 (0.777, 0.935) 0.867 (0.633, 0.906) 0.910 (0.728, 0.936)
  • Min - Max
0.480 - 0.977 0.561 - 0.958 0.480 - 0.977
  • Missing
0 0 0
public_trans_pc
  • N
18 7 25
  • Mean (SD)
0.042 (0.072) 0.034 (0.080) 0.040 (0.073)
  • Median (Q1, Q3)
0.007 (0.000, 0.056) 0.004 (0.000, 0.009) 0.006 (0.000, 0.011)
  • Min - Max
0.000 - 0.247 0.000 - 0.214 0.000 - 0.247
  • Missing
0 0 0
walked_pc
  • N
18 7 25
  • Mean (SD)
0.040 (0.052) 0.101 (0.152) 0.057 (0.092)
  • Median (Q1, Q3)
0.023 (0.008, 0.044) 0.017 (0.007, 0.158) 0.023 (0.005, 0.045)
  • Min - Max
0.000 - 0.197 0.000 - 0.362 0.000 - 0.362
  • Missing
0 0 0
work_home_pc
  • N
18 7 25
  • Mean (SD)
0.044 (0.025) 0.068 (0.035) 0.051 (0.030)
  • Median (Q1, Q3)
0.038 (0.029, 0.062) 0.058 (0.048, 0.089) 0.045 (0.031, 0.069)
  • Min - Max
0.000 - 0.095 0.025 - 0.119 0.000 - 0.119
  • Missing
0 0 0
non_hisp_black_pc
  • N
18 7 25
  • Mean (SD)
0.120 (0.118) 0.104 (0.048) 0.116 (0.102)
  • Median (Q1, Q3)
0.073 (0.036, 0.171) 0.087 (0.076, 0.132) 0.087 (0.046, 0.162)
  • Min - Max
0.004 - 0.420 0.048 - 0.178 0.004 - 0.420
  • Missing
0 0 0
non_hisp_white_pc
  • N
18 7 25
  • Mean (SD)
0.760 (0.162) 0.729 (0.135) 0.751 (0.153)
  • Median (Q1, Q3)
0.778 (0.650, 0.862) 0.764 (0.712, 0.808) 0.764 (0.703, 0.846)
  • Min - Max
0.410 - 0.984 0.448 - 0.854 0.410 - 0.984
  • Missing
0 0 0
non_hisp_asian_pc
  • N
18 7 25
  • Mean (SD)
0.041 (0.061) 0.066 (0.076) 0.048 (0.065)
  • Median (Q1, Q3)
0.019 (0.005, 0.043) 0.044 (0.017, 0.082) 0.025 (0.007, 0.079)
  • Min - Max
0.000 - 0.252 0.000 - 0.222 0.000 - 0.252
  • Missing
0 0 0
hisp_pc
  • N
18 7 25
  • Mean (SD)
0.049 (0.043) 0.073 (0.048) 0.056 (0.045)
  • Median (Q1, Q3)
0.039 (0.021, 0.063) 0.048 (0.042, 0.084) 0.041 (0.027, 0.066)
  • Min - Max
0.000 - 0.151 0.038 - 0.170 0.000 - 0.170
  • Missing
0 0 0
below_hs_pc
  • N
18 7 25
  • Mean (SD)
0.487 (0.063) 0.509 (0.073) 0.493 (0.065)
  • Median (Q1, Q3)
0.487 (0.450, 0.514) 0.485 (0.460, 0.536) 0.485 (0.453, 0.515)
  • Min - Max
0.375 - 0.649 0.445 - 0.638 0.375 - 0.649
  • Missing
0 0 0
college_pc
  • N
18 7 25
  • Mean (SD)
0.253 (0.045) 0.224 (0.093) 0.245 (0.061)
  • Median (Q1, Q3)
0.263 (0.205, 0.292) 0.210 (0.171, 0.275) 0.253 (0.196, 0.292)
  • Min - Max
0.190 - 0.327 0.100 - 0.363 0.100 - 0.363
  • Missing
0 0 0
age_over_65_pc
  • N
18 7 25
  • Mean (SD)
0.024 (0.018) 0.028 (0.023) 0.025 (0.019)
  • Median (Q1, Q3)
0.020 (0.014, 0.029) 0.016 (0.014, 0.039) 0.019 (0.014, 0.029)
  • Min - Max
0.000 - 0.069 0.006 - 0.066 0.000 - 0.069
  • Missing
0 0 0
age_65_under_pc
  • N
18 7 25
  • Mean (SD)
0.976 (0.018) 0.972 (0.023) 0.975 (0.019)
  • Median (Q1, Q3)
0.980 (0.971, 0.986) 0.984 (0.961, 0.986) 0.981 (0.971, 0.986)
  • Min - Max
0.931 - 1.000 0.934 - 0.994 0.931 - 1.000
  • Missing
0 0 0
unemployment_pc
  • N
18 7 25
  • Mean (SD)
0.068 (0.040) 0.057 (0.027) 0.065 (0.037)
  • Median (Q1, Q3)
0.063 (0.039, 0.083) 0.053 (0.037, 0.074) 0.063 (0.036, 0.082)
  • Min - Max
0.016 - 0.166 0.023 - 0.100 0.016 - 0.166
  • Missing
0 0 0